125 research outputs found
Multimodal Observation and Interpretation of Subjects Engaged in Problem Solving
In this paper we present the first results of a pilot experiment in the
capture and interpretation of multimodal signals of human experts engaged in
solving challenging chess problems. Our goal is to investigate the extent to
which observations of eye-gaze, posture, emotion and other physiological
signals can be used to model the cognitive state of subjects, and to explore
the integration of multiple sensor modalities to improve the reliability of
detection of human displays of awareness and emotion. We observed chess players
engaged in problems of increasing difficulty while recording their behavior.
Such recordings can be used to estimate a participant's awareness of the
current situation and to predict the ability to respond effectively to
challenging situations.
situations. Results show that a multimodal approach is more accurate than a
unimodal one. By combining body posture, visual attention and emotion, the
multimodal approach reaches up to 93% accuracy when determining a player's
chess expertise, whereas a unimodal approach reaches 86%. Finally, this
experiment validates the use of our equipment as a general and reproducible
tool for the study of participants engaged in screen-based interaction and/or
problem solving.
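As an illustration of the kind of late fusion such a multimodal approach can
rely on (a minimal sketch under our own assumptions; the modality names,
scores and weights below are hypothetical, not the authors' classifier), each
modality can output a probability that the player is an expert, and a weighted
average can combine them into a single multimodal decision:

```python
def fuse_modalities(scores, weights):
    """Weighted late fusion of per-modality expert-probability scores."""
    total = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total

# Hypothetical per-modality scores for one participant
scores = {"posture": 0.70, "gaze": 0.85, "emotion": 0.60}
weights = {"posture": 1.0, "gaze": 2.0, "emotion": 1.0}

print(fuse_modalities(scores, weights))  # 0.75
```

Weighting the gaze modality more strongly here simply illustrates that fusion
lets more reliable sensors dominate the decision; learned fusion weights would
replace these hand-set values in practice.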
Deep learning investigation for chess player attention prediction using eye-tracking and game data
This article reports on an investigation of the use of convolutional neural
networks to predict the visual attention of chess players. The visual attention
model described in this article has been created to generate saliency maps that
capture hierarchical and spatial features of the chessboard, in order to
predict the fixation probability for individual pixels. Using a skip-layer
architecture
of an autoencoder, with a unified decoder, we are able to use multiscale
features to predict saliency of part of the board at different scales, showing
multiple relations between pieces. We have used scan path and fixation data
from players engaged in solving chess problems to compute 6600 saliency maps
associated with the corresponding chess piece configurations. This corpus is
completed with synthetically generated data from actual games gathered from an
online chess platform. Experiments carried out using both scan paths from
chess players and the CAT2000 saliency dataset of natural images highlight
several results. Deep features, pretrained on natural images, were found to be helpful
in training visual attention prediction for chess. The proposed neural network
architecture is able to generate meaningful saliency maps on unseen chess
configurations with good scores on standard metrics. This work provides a
baseline for future work on visual attention prediction in similar contexts.
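The multiscale idea can be illustrated with a toy sketch (our own assumption
of how scales combine; the paper's actual model is a skip-layer convolutional
autoencoder, which is not reproduced here): a coarse attention map is
upsampled to board resolution and averaged with a fine map, so that both
piece-level and region-level saliency contribute to each square's score.

```python
def upsample(grid, factor):
    """Nearest-neighbour upsampling of a 2-D grid."""
    return [[grid[r // factor][c // factor]
             for c in range(len(grid[0]) * factor)]
            for r in range(len(grid) * factor)]

def fuse(fine, coarse):
    """Average a fine 8x8 saliency map with an upsampled coarse map."""
    up = upsample(coarse, len(fine) // len(coarse))
    return [[(f + u) / 2 for f, u in zip(fr, ur)]
            for fr, ur in zip(fine, up)]

fine = [[0.0] * 8 for _ in range(8)]
fine[0][0] = 1.0                    # sharp fixation on one square
coarse = [[0.5, 0.0], [0.0, 0.0]]   # broad attention over one quadrant

fused = fuse(fine, coarse)
print(fused[0][0], fused[7][7])  # 0.75 0.0
```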
SocialInteractionGAN: Multi-person Interaction Sequence Generation
Prediction of human actions in social interactions has important applications
in the design of social robots or artificial avatars. In this paper, we model
human interaction generation as a discrete multi-sequence generation problem
and present SocialInteractionGAN, a novel adversarial architecture for
conditional interaction generation. Our model builds on a recurrent
encoder-decoder generator network and a dual-stream discriminator. This
architecture allows the discriminator to jointly assess the realism of
interactions and that of individual action sequences. Within each stream a
recurrent network operating on short subsequences endows the output signal with
local assessments, better guiding the forthcoming generation. Crucially,
contextual information on interacting participants is shared among agents and
reinjected in both the generation and the discriminator evaluation processes.
We show that the proposed SocialInteractionGAN succeeds in producing highly
realistic action sequences of interacting people, comparing favorably to a
diversity of recurrent and convolutional discriminator baselines. Evaluations
are conducted using modified Inception Score and Fréchet Inception Distance
metrics that we specifically designed for generated discrete sequence data.
The distribution of generated sequences is shown to closely approach that of
real data. In particular, our model properly learns the dynamics of
interaction sequences while exploiting the full range of actions.
A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
Animating still face images with deep generative models using a speech input
signal is an active research topic and has seen important recent progress.
However, much of the effort has been put into lip syncing and rendering quality
while the generation of natural head motion, let alone the audio-visual
correlation between head motion and speech, has often been neglected. In this
work, we propose a multi-scale audio-visual synchrony loss and a multi-scale
autoregressive GAN to better handle short and long-term correlation between
speech and the dynamics of the head and lips. In particular, we train a stack
of syncer models on multimodal input pyramids and use these models as guidance
in a multi-scale generator network to produce audio-aligned motion unfolding
over diverse time scales. Our generator operates in the facial landmark domain,
which is a standard low-dimensional head representation. The experiments show
significant improvements over the state of the art in head motion dynamics
quality and in multi-scale audio-visual synchrony, both in the landmark
domain and in the image domain.
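The notion of a multi-scale synchrony measure can be sketched as follows
(an illustrative toy under our own assumptions, not the paper's loss): two
1-D signals, say audio energy and lip motion, are average-pooled at several
window sizes and compared with a mean-squared error per scale, so both short
and long-term misalignments are penalized.

```python
def avg_pool(signal, size):
    """Average-pool a 1-D signal with non-overlapping windows."""
    return [sum(signal[i:i + size]) / size
            for i in range(0, len(signal) - size + 1, size)]

def multiscale_sync_loss(audio, motion, scales=(1, 2, 4)):
    """Sum of per-scale MSEs between pooled audio and motion signals."""
    loss = 0.0
    for s in scales:
        a, m = avg_pool(audio, s), avg_pool(motion, s)
        loss += sum((x - y) ** 2 for x, y in zip(a, m)) / len(a)
    return loss

audio = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
motion = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(multiscale_sync_loss(audio, motion))  # 0.0 for perfectly aligned signals
```

Shifting `motion` by one sample makes the loss strictly positive, which is
the behaviour a synchrony loss needs to drive audio-aligned generation.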
Perceptive Services Composition using semantic language and distributed knowledge
International audienceBuilding applications composing perceptive services in a pervasive environment can lead to an inextricable problem: they were built by several people, using different programming languages and multiple conventions and protocols. Moreover, services can be volatile, so appear or disappear during running time of the application. This paper proposes the use of a dedicated human-readable semantic language to describe perceptive services. After converting this description into a more common language, one can recruit services using inference engines to build complex applications. In order to increase robustness of the whole system, descriptions of services are distributed over the network using a crosslanguage crossplateform open-source middleware of our own called OMiSCID
Multi-Sensors Engagement Detection with a Robot Companion in a Home Environment
Workshop FW1 "Assistance and Service Robotics in a Human Environment" - Session3: Behavioral modeling and Human/Robot InteractionInternational audienceRecognition of intentions is an unconscious cognitive process vital to human communication. This skill enables anticipation and increases interactive exchanges quality between humans. Within the context of engagement, i.e. intention for interaction, non-verbal signals are used to communicate this intention to the partner. In this paper, we investigated methods to detect these signals in order to allow a robot to know when it is about to be addressed. Classically, the human position and speed, the human-robot distance are used to detect the engagement. Our hypothesis is that this method is not enough in a context of a home environment. The chosen approach integrates multimodal features gathered using a robot enhanced with a Kinect. The evaluation of this new method of detection on our corpus collected in spontaneous conditions highlights its robustness and validates use of such technique in real environment. Experimental validation shows that the use of multimodal sensors gives better precision and recall than the detector using only spatial and speed features. We also demonstrate that 7 multimodal features are sufficient to provide a good engagement detection score
Autoregressive GAN for Semantic Unconditional Head Motion Generation
We address the task of unconditional head motion generation to animate still
human faces in a low-dimensional semantic space. Deviating from talking head
generation conditioned on audio, which seldom puts emphasis on realistic head
motions, we devise a GAN-based architecture that allows obtaining rich head
motion sequences while avoiding known caveats associated with GANs. Namely,
the autoregressive generation of incremental outputs ensures smooth
trajectories, while a multi-scale discriminator on input pairs drives
generation toward better handling of high and low frequency signals and less
mode collapse. We demonstrate experimentally the relevance of the proposed
architecture and compare with models that showed state-of-the-art performance
on similar tasks.
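Why incremental outputs yield smooth trajectories can be shown with a minimal
sketch (our own illustration, not the paper's generator, whose increments
would come from a learned network rather than a random draw): when the model
emits bounded pose *increments* and the pose is obtained by accumulation, the
frame-to-frame jump can never exceed the increment bound.

```python
import random

def generate_trajectory(steps, max_step=0.05, seed=0):
    """Autoregressively accumulate bounded increments into a pose sequence."""
    rng = random.Random(seed)
    pose, trajectory = 0.0, []
    for _ in range(steps):
        pose += rng.uniform(-max_step, max_step)  # bounded increment
        trajectory.append(pose)
    return trajectory

traj = generate_trajectory(100)
max_jump = max(abs(b - a) for a, b in zip(traj, traj[1:]))
print(max_jump <= 0.05)  # True: increments bound frame-to-frame motion
```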
Smartphone-based User Location Tracking in Indoor Environment
This paper introduces our work in the framework of Track 3 of the IPIN 2016 Indoor Localization Competition, which addresses the smartphone-based tracking problem in an offline manner. Our approach splits path reconstruction into several smaller tasks, including building identification, floor identification, and user direction and speed inference. For each task, a specific subset of the provided log data is used. Evaluation is carried out using a cross-validation scheme. To improve robustness against noisy data, we combine several approaches into one on the basis of their testing results. By testing on the provided training data, we obtain good accuracy on building and floor identification. For the task of tracking the user's position within the floor, the third-quartile distance error is 10 m after 3 minutes of walking.
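The direction-and-speed inference step feeds a standard dead-reckoning
integration, which can be sketched as follows (an illustrative sketch under
our own assumptions; function and parameter names are hypothetical): each
step advances the position by speed times time step along the inferred
heading.

```python
import math

def integrate_path(start, headings_deg, speeds, dt=1.0):
    """Reconstruct a 2-D path from per-step heading (degrees) and speed."""
    x, y = start
    path = [(x, y)]
    for h, v in zip(headings_deg, speeds):
        rad = math.radians(h)
        x += v * dt * math.cos(rad)
        y += v * dt * math.sin(rad)
        path.append((x, y))
    return path

# Walk east for 2 s, then north for 2 s, at 1 m/s
path = integrate_path((0.0, 0.0), [0, 0, 90, 90], [1.0] * 4)
print(path[-1])  # (2.0, 2.0)
```

Heading or speed errors accumulate over time under this scheme, which is why
the abstract reports distance error growing with walking duration.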
Autonomous Robot Controller Using Bitwise Gibbs Sampling
In the present paper we describe a bio-inspired, non-von Neumann controller for a simple sensorimotor robotic system. This controller uses a bitwise version of the Gibbs sampling algorithm to select commands so that the robot can adapt its course of action and avoid perceived obstacles in the environment. The VHDL specification of the circuit implementation of this controller is based on stochastic computation to perform Bayesian inference at a low energy cost. We show that the proposed unconventional architecture makes it possible to successfully carry out the obstacle avoidance task and to address scalability issues observed in previous works.
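Bitwise Gibbs sampling for command selection can be sketched in software
(a hedged toy under our own assumptions, not the paper's stochastic circuit;
the energy function below is invented for illustration): each bit of a 2-bit
motor command is resampled in turn, conditioned on the other bits, under a
distribution that penalizes commands steering toward a perceived obstacle.

```python
import math
import random

def energy(bits, obstacle_bits):
    """Higher energy (less probable) when command bits match the obstacle direction."""
    return 4.0 * sum(b == o for b, o in zip(bits, obstacle_bits))

def gibbs_sample(obstacle_bits, sweeps=200, seed=1):
    """Bitwise Gibbs sampling: resample one command bit at a time."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in obstacle_bits]
    for _ in range(sweeps):
        for i in range(len(bits)):
            p = []
            for v in (0, 1):  # conditional probability of bit i being 0 or 1
                trial = bits[:i] + [v] + bits[i + 1:]
                p.append(math.exp(-energy(trial, obstacle_bits)))
            bits[i] = 1 if rng.random() < p[1] / (p[0] + p[1]) else 0
    return bits

# Obstacle perceived in direction "01": sampled commands should avoid it
print(gibbs_sample([0, 1]))
```

With this energy, the command `[1, 0]` (the direction opposite the obstacle)
dominates the stationary distribution, so the sampler steers away from the
obstacle most of the time while retaining stochastic exploration.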
- …